Lifeguard: Local Health Awareness for More Accurate Failure Detection
SWIM is a peer-to-peer group membership protocol with attractive scaling and
robustness properties. However, slow message processing can cause SWIM to mark
healthy members as failed (so-called false-positive failure detection), despite
including a mechanism intended to avoid this.
We identify the properties of SWIM that lead to the problem and propose
Lifeguard, a set of extensions to SWIM that consider that the local failure
detector module itself may be at fault, via the concept of local health. We evaluate
this approach in a precisely controlled environment and validate it in a
real-world scenario, showing that it drastically reduces the rate of false
positives. The false-positive rate and the detection time for true failures can be
reduced simultaneously, compared to the baseline levels of SWIM.
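The local-health idea described above can be sketched as a probe timeout that stretches when the local node suspects its own responsiveness. This is a minimal illustrative sketch in Python, not the paper's or any implementation's actual API; all names and parameters here are assumptions.

```python
class LocalHealthProber:
    """Sketch of a SWIM-style prober that scales its probe timeout by a
    local-health multiplier, in the spirit of Lifeguard: a missed ack may
    mean *we* are slow, not that the peer has failed."""

    def __init__(self, base_timeout=0.5, max_multiplier=8):
        self.base_timeout = base_timeout
        self.max_multiplier = max_multiplier
        self.health_score = 0  # 0 = locally healthy; higher = suspect self

    def on_probe_result(self, got_ack: bool):
        # Successful probes restore confidence in the local node;
        # missed acks raise self-suspicion instead of immediately
        # accusing the remote peer.
        if got_ack:
            self.health_score = max(0, self.health_score - 1)
        else:
            self.health_score = min(self.max_multiplier - 1,
                                    self.health_score + 1)

    def probe_timeout(self) -> float:
        # An unhealthy local node waits longer before declaring failure,
        # trading detection latency for fewer false positives.
        return self.base_timeout * (1 + self.health_score)
```

The key design point is that the timeout adapts to the prober's own state: the node that is slow to process messages is exactly the node most likely to produce false accusations.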
Assessing the Amazon Cloud Suitability for CLARREO's Computational Needs
In this document we compare the performance of Amazon Web Services (AWS), also known as the Amazon Cloud, with the CLARREO (Climate Absolute Radiance and Refractivity Observatory) cluster and assess its suitability for the computational needs of the CLARREO mission. A benchmark executable processing one month and one year of PARASOL (Polarization and Anisotropy of Reflectances for Atmospheric Sciences coupled with Observations from a Lidar) data was used. With the optimal AWS configuration, adequate data-processing times, comparable to the CLARREO cluster, were found. The assessment of alternatives to the CLARREO cluster continues, and several options, such as a NASA-based cluster, are being considered.
Supporting iteration in a heterogeneous dataflow engine
Dataflow execution engines such as MapReduce, DryadLINQ, and PTask have enjoyed success because they simplify development for a class of important parallel applications. These systems sacrifice generality for simplicity: while many workloads are easily expressed, important idioms like iteration and recursion are difficult to express and support efficiently. We consider the problem of extending a dataflow engine to support data-dependent iteration in a heterogeneous environment, where architectural diversity introduces data migration and scheduling challenges that complicate the problem. We propose constructs that enable a dataflow engine to efficiently support data-dependent control flow in a heterogeneous environment, implement them in a prototype system called IDEA, and use them to implement a variant of optical flow, a well-studied computer vision algorithm. Optical flow relies heavily on nested loops, making it difficult to express without explicit support for iteration. We demonstrate that IDEA enables up to 18× speedup over sequential execution and 32% speedup over a GPU implementation using synchronous host-based control.
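The data-dependent control flow the abstract describes can be illustrated with a tiny higher-order construct: rerun a dataflow stage until a predicate over its output says to stop. This is a conceptual sketch only; the names here are invented for illustration and IDEA's actual constructs differ.

```python
def iterate(dataflow_step, keep_going, state):
    """Sketch of a data-dependent iteration construct: the loop bound is
    not known statically, so the engine must re-dispatch the stage until
    the predicate over its output is false."""
    while keep_going(state):
        state = dataflow_step(state)
    return state

# Toy example: halve a residual until it drops below a threshold,
# loosely analogous to the nested refinement loops in optical flow.
result = iterate(lambda x: x / 2, lambda x: x > 0.1, 10.0)
```

In a heterogeneous engine, each `dataflow_step` invocation may run on a GPU, so evaluating `keep_going` on the host each iteration forces a device-to-host transfer; avoiding that synchronization is part of what makes efficient support nontrivial.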
Dandelion: a compiler and runtime for heterogeneous systems
Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability, and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with very different programming abstractions and runtimes, programming them remains extremely challenging. Dandelion is a system designed to address this programmability challenge for data-parallel applications. Dandelion provides a unified programming model for heterogeneous systems that span a diverse array of execution contexts including CPUs, GPUs, FPGAs, and the cloud. It adopts the .NET LINQ (Language INtegrated Query) approach, integrating data-parallel operators into general-purpose programming languages such as C# and F#, and therefore provides an expressive data model and native language integration for user-defined functions. This enables programmers to write applications using standard high-level languages and development tools, independent of any specific execution context. Dandelion automatically and transparently distributes the data-parallel portions of a program to the available computing resources, including compute clusters for distributed execution and the CPU and GPU cores of individual compute nodes for parallel execution. To enable the automatic execution of .NET code on GPUs, Dandelion cross-compiles .NET code to CUDA kernels and uses a GPU dataflow runtime called EDGE to manage GPU execution. This paper describes the design and implementation of the Dandelion compiler and runtime, focusing on the distributed CPU and GPU implementation. We report on our evaluation of the system using a diverse set of workloads and execution contexts.
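The LINQ-style operator model the abstract refers to can be shown in miniature. The sketch below uses Python's built-in map/filter as rough stand-ins for LINQ's Select/Where; Dandelion itself integrates these operators into C# and F# and compiles them to CUDA and cluster execution plans, which this illustration does not attempt.

```python
# Sketch of the data-parallel operator style (not Dandelion's API):
# a declarative pipeline of element-wise operators that a system like
# Dandelion could, in principle, offload transparently.
data = range(10)
squares_of_evens = list(
    map(lambda x: x * x,              # analogous to LINQ Select
        filter(lambda x: x % 2 == 0,  # analogous to LINQ Where
               data)))
# squares_of_evens == [0, 4, 16, 36, 64]
```

Because such pipelines name only pure, element-wise operators, a compiler can retarget each stage to a GPU kernel or a cluster stage without changing the user's code, which is the programmability point the paper makes.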